DOCUMENT RESUME

ED 091 409                                                  TM 003 620

AUTHOR        Oles, Henry J.
TITLE         Stability of Student Evaluations of Instructors and
              Their Courses.
PUB DATE      Apr 74
NOTE          17p.; Paper presented at American Educational
              Research Association Annual Meeting (Chicago,
              Illinois, April 15-19, 1974)
EDRS PRICE    MF-$0.75 HC-$1.50 PLUS POSTAGE
DESCRIPTORS   College Instruction; College Students; *Course
              Evaluation; *Reliability; Student Attitudes; Student
              Evaluation; *Teacher Evaluation; Validity



ABSTRACT

A course-instructor evaluation form, specifically adapted for this study,
was administered to 775 undergraduates in 15 large and small section
introductory courses after the second class meeting and again near the end of
the semester. The median pretest posttest correlation was +.60. Although
there were many systematic changes, students were generally more negative
toward their course and instructor at the end of the semester than they were
at the beginning. As a separate portion of this project, two instructors
deliberately attempted to alter their students' evaluation in one large
section of their introductory psychology course. In both cases, there was a
significant overall mean difference between the experimental and control
groups on the initial evaluation but there was no difference on the end of
the semester evaluation. The results of this study indicate that students
form reasonably lasting judgments of their instructors and courses but are
also able to alter their judgments as warranted by changing situations.
(Author)






ABSTRACT 

STABILITY OF STUDENT EVALUATIONS OF INSTRUCTORS AND THEIR COURSES

A course-instructor evaluation form, specifically adapted for this study,
was administered to 775 undergraduates in 15 large and small section
introductory courses after the second class meeting and again near the end of the
semester. The median pretest posttest correlation was +.60. Although there
were many systematic changes, students were generally more negative toward
their course and instructor at the end of the semester than they were at the
beginning.

As a separate portion of this project, two instructors deliberately
attempted to alter their students' evaluation in one large section of their
introductory psychology course. In both cases, there was a significant
overall mean difference between the experimental and control groups on the initial
evaluation but there was no difference on the end of semester evaluation.

The results of this study indicate that students form reasonably lasting
judgments of their instructors and courses but are also able to alter their
judgments as warranted by changing situations.






INTRODUCTION



The use of student evaluations of faculty and courses is now common on
most college campuses. In many cases, the evaluative information is being
published for review by students and is being used by administrators as a
basis for tenure and promotion decisions. Although many arguments have been
made against using student evaluations as a primary criterion for professional
advancement (Dressel, 1973), most of the research findings indicate that
student-run evaluations are reliable and reasonably valid indicants of teacher
performance (Costin, Greenough, and Menges, 1971). Regardless of the opinion
of academia, student evaluations are being used at most institutions of
higher education for a wide variety of purposes. Therefore, it is of prime
importance to continue to conduct research on student evaluations to
determine and improve their reliability, validity, and utility.

The major thrust of current research efforts is to determine what
characteristics of the instructor and his course students are actually attempting
to assess and the degree to which these ratings are valid. A number of studies
beginning with Remmers in 1928 have attempted to identify correlations between
student characteristics, expected grade, and ratings. Although the results
have been at times conflicting, they generally show little or no relationship
between expected grade and instructor ratings, nor are there many meaningful
significant relationships between other student/teacher characteristics and
ratings (Costin, Greenough, and Menges, 1971).

Several relatively recent studies have attempted to determine the
relationship between ratings made while the course was in progress and those made at
the end of the course (Dick, 1967; Costin, 1968; Stallings & Spencer, 1967).
Bausell & Magoon (1972), in a paper presented at the annual meeting of the
American Educational Research Association, reported a median correlation of
.67 between ratings made at the end of the first class period and those made
at the end of the semester. This finding was of particular concern to this
researcher since it could indicate that students enter a course with a
definite, predisposed, and unalterable set of feelings about the course and
instructor, or that they quickly form a rigid and lasting set of attitudes after only
minimal exposure.

This study was designed to both replicate and expand upon the work of
Bausell and Magoon. In their study, the subjects were undergraduate and
graduate students in 20 courses with a median class size of 15 and a range
from 9 to 33 students. It was felt that their use of upper level
undergraduates and graduate students, who already may have developed strong
preconceptions of teachers and college courses in general, and the unusually small class
sizes, may have biased their results. In addition, Bausell and Magoon
administered the same questionnaire, the standard University of Delaware student
evaluation form, for both the pre- and posttests. This researcher has found
in a pilot study that undergraduate students vehemently object to evaluating
a teacher or course after the first or second class day using a form that was
obviously designed for use at the end of the semester. In one class, more than
a third of the students refused to cooperate while the responses of another
third were, at best, questionable. Many of the students indiscriminately
filled the entire form with either highest or high average ratings. Therefore,
the form used in this study was designed to overcome student objections by
carefully wording the directions to the respondent on the pretest, stressing
the fact that the form was specifically designed to measure their first
impressions of the instructor and his course. In addition, each of the questions was
worded to make it appropriate for a first impression evaluation. The
posttest was essentially the same as the pretest with only minor changes in
tense (e.g., whereas the pretest stated "This teacher seems to be..." the
posttest stated "This teacher was...").

Virtually all research on student evaluations has been conducted after
the fact with very few attempts at deliberate experimental manipulation of
ratings except through providing feedback to an instructor about his ratings
(Aleamoni, 1972; Oles and Lencoski, 1973). As a subsequent portion of this
study, this researcher and a colleague each taught two essentially identical
sections of introductory psychology with approximately 125 students in each
section. A deliberate attempt was made to create a negative first impression
in the experimental section by beginning the course with an unusually dry
lecture on the historical roots of the science of psychology and the methods
of science to determine whether this treatment would alter the students' ratings.
If a variation in instructor performance is reflected in student ratings in
the direction intuitively expected, the results would add to the construct
validity of student ratings in general since many skeptics have insisted that
student ratings are not directly related to any actual teacher behavior other
than theatrics.

METHODOLOGY 

Subjects 

The subjects for this study originally included 1302 undergraduate students
at Southwest Texas State University enrolled in 15 lower division courses taught
by 13 different instructors with class sizes ranging from 17 to 154.
Approximately 50% of the subjects were classified as freshmen.



The instrument used to gather student evaluations of their instructor
and course consisted of 22 evaluative items (21 on the pretest) covering
various dimensions of instructor performance and the course in general.
Additional items were included to obtain respondent biographic information.
The form was a modification of an instrument originally designed by
representatives of the student government and the faculty for voluntary use on this
campus. The wording of the directions to the student and the questions were
carefully altered to make the task of rating the instructor and course after
only the second day of class appear to be legitimate. In a pilot study
conducted previously on this campus using a standard unaltered end of course
evaluation form, a significant proportion of the students refused to respond
even though they were verbally told that they were to report their first
impressions. There was no problem in getting students to respond to the
altered form pretest, which was obviously specifically structured to assess
first impressions.

Procedure 

During the second or the beginning of the third class meeting, all
students in 15 undergraduate courses were asked to complete the first impression
instructor/course rating form. The instructor was asked to leave the room
while the forms were distributed by a graduate student. Each group was clearly
told that the purpose of the first impression rating form was to help improve
the design of student evaluation forms. They were reminded several times that
their instructor would not see the ratings until after the semester was complete
and grades were submitted. Essentially identical instructions were given with
the posttest, which was administered by the same person during the last week of
the semester. The students were not told about the posttest when they took
the pretest. For possible future identification, without revealing their real
identity, the students were told to use their mother's maiden name or a
fictitious name they would not be likely to forget. In this way, they could remain
certain that their responses could not be traced directly to them. This was
essentially the same procedure followed by Bausell and Magoon.
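
The pseudonym procedure implies a matching step when pretest and posttest
forms are later paired. The following is only a minimal sketch of how such
matching might be carried out; the record layout, the `normalize` helper,
and the sample names are hypothetical and are not taken from the study.

```python
from collections import Counter


def normalize(name):
    """Lower-case a pseudonym and collapse whitespace so minor variations match."""
    return " ".join(name.lower().split())


def match_forms(pretests, posttests):
    """Pair pretest and posttest forms that share a (class, pseudonym) key.

    Each form is a dict with the hypothetical keys 'class_id', 'pseudonym',
    and 'ratings'. A key that occurs more than once within one administration
    is treated as ambiguous and is left unmatched.
    """
    def index(forms):
        keys = [(f["class_id"], normalize(f["pseudonym"])) for f in forms]
        counts = Counter(keys)
        return {k: f for k, f in zip(keys, forms) if counts[k] == 1}

    pre, post = index(pretests), index(posttests)
    return [(pre[k], post[k]) for k in pre if k in post]


# Invented sample data for illustration only.
pretests = [
    {"class_id": 1, "pseudonym": "Jane  Doe", "ratings": [0, 1, 1]},
    {"class_id": 1, "pseudonym": "Smith", "ratings": [1, 1, 2]},
    {"class_id": 2, "pseudonym": "Smith", "ratings": [0, 0, 1]},
]
posttests = [
    {"class_id": 1, "pseudonym": "jane doe", "ratings": [1, 1, 2]},
    {"class_id": 2, "pseudonym": "Jones", "ratings": [0, 1, 1]},
]

matched = match_forms(pretests, posttests)
print(len(matched), "of", len(pretests), "pretest forms matched")
```

Forms that cannot be paired in this way are simply lost to the analysis,
which is one source of the attrition any pre-post design of this kind
must expect.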

Thirteen instructors agreed to participate in the study. Two instructors
each taught two essentially identical sections of introductory psychology.
Their normal approaches to beginning the introductory course were quite different.
One instructor (Instructor A) used several interest arousing lectures while the
other (Instructor B) plunged in the first day with an admittedly dry, at least
in terms of student interest, lecture on the methods of science and historical
perspectives in psychology. Each instructor agreed to attempt to alter his
behavior in one class to match that of his colleague. This resulted in two
classes that received a high interest introductory lecture and two classes that
received a rather low interest lecture. The instructors were then told to
continue the semester after the second class day with their standard style of
teaching. Both instructors later reported that they had forgotten which of
their two sections had received the atypical treatment.

Ideally, this portion of the study should have been extended to a
significantly larger number of instructors and courses. However, this researcher was
concerned about the moral and ethical obligations of every teacher to do his best
in teaching his courses. Therefore, the decision was made to use deliberate
modification of normal teaching practice in only two highly controlled
situations even though this decision would result in some questioning of the
validity and generalizability of the findings. It is unlikely, but nevertheless
possible, that a teacher who deliberately makes his lectures uninteresting,
or even boring, simply for an experimental manipulation of a group of students,
may unavoidably and unknowingly encourage a student to drop the course for this
reason alone.

Two threats to the internal validity of the procedures employed in this
investigation are the possible effects of having taken the pretest on posttest
performance and the students' familiarity with the instructor before the first
class meeting. Bausell and Magoon specifically examined their data for
pretest sensitization and found none. No similar test was performed in this
study; however, observation of student reactions to the posttest indicated that
they had virtually forgotten having taken the pretest three months previously.
None of the students had had any previous classroom contact with the
instructor since all of the courses were introductory; however, there is no way to
avoid the "campus grapevine."

RESULTS 

Pretest-Posttest Comparisons

Of the 1302 students who took the pretest, 775 were matched with their
posttest ratings. Approximately 40% of the subjects were lost because of
absences, withdrawals, incomplete forms, and inability to match the two forms.

Table 1 presents the percentage of subjects selecting each response
option for 21 items on the pretest and posttest and for one item found only
on the posttest. The most interesting finding is the large proportion of
students who chose the most favorable response options, 0 and 1. Response
option 2 was rarely selected for most items while option 3 responses were
essentially nonexistent, especially on the pretest. Students evidently are
inclined to give positive ratings even to relatively poor teachers. This
finding is in agreement with a report made by Centra (1973). The actual
reporting of unusually high instructor course ratings is in direct contrast
with the findings of Costin, Greenough and Menges (1971). The student
subjects in their study overwhelmingly stated that they would not rate college
teachers in general higher than they deserve, because there are so many bad
teachers and so few really good ones.
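
To make the link between a response distribution and a mean rating
concrete, the sketch below converts a row of percentages into a mean and
into the share of favorable responses. It assumes options are coded 0, 1,
2, 3 with lower values more favorable; the function names and the example
percentages are hypothetical and do not come from Table 1.

```python
def mean_from_percentages(percentages):
    """Mean response, where percentages[k] is the percent choosing option k."""
    total = sum(percentages)
    return sum(option * pct for option, pct in enumerate(percentages)) / total


def favorable_share(percentages, favorable_options=(0, 1)):
    """Fraction of respondents who chose one of the favorable options."""
    total = sum(percentages)
    return sum(percentages[k] for k in favorable_options) / total


# Invented item: 60% chose option 0, 35% option 1, 4% option 2, 1% option 3.
example = [60, 35, 4, 1]

# Midpoint of a scale coded 0 through 3; means below it lean favorable.
scale_midpoint = (0 + 3) / 2

print(round(mean_from_percentages(example), 2))  # 0.46
print(round(favorable_share(example), 2))        # 0.95
print(scale_midpoint)                            # 1.5
```

A mean computed this way pools all respondents, so it need not equal a
mean of class means for the same item.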

The mean rating for each of the evaluative items was calculated for each
class on the pretest and posttest. Pretest-posttest mean ratings were
significantly different at the .05 level for nine items. Students reported
significantly less interest in their course, expected a lower grade, found the
textbook more objectionable, found the teacher's explanations more inadequate, lost
some desire to attend class, saw less value in attending class, and thought
the instructor wasted more class time at the end of the semester than at the
beginning. However, students did see exams and grading as being more fair at
the end of the course than at the beginning even though many expected to receive
considerably lower grades than they expected at the beginning.
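
The paper does not state which form of t test was applied to the class
means. One plausible reading, offered here only as an assumption, is a
paired t test in which each class contributes one pretest mean and one
posttest mean for an item. The sketch below shows that computation; the
function name and the five pairs of class means are invented.

```python
import math
import statistics


def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for matched observations."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation, n - 1).
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se, n - 1


# Invented class means for one item (lower = more favorable rating).
pre_means = [0.8, 1.0, 0.9, 1.1, 0.7]
post_means = [1.2, 1.3, 1.5, 1.4, 1.1]

t, df = paired_t(pre_means, post_means)
print(round(t, 2), df)
```

A positive t under this coding means the posttest ratings were less
favorable than the pretest ratings.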

The median pre-posttest correlation for all 21 items was .60, with item
correlations ranging from -.11 for the amount of information learned to +.86
for course difficulty and attractiveness of the teacher's personality.
Generally the obtained correlations were agreeable to reason. Those aspects of
the course that could potentially be reliably and validly assessed at the
beginning were highly correlated with posttest ratings, while those aspects
that could conceivably be accurately rated only after several weeks of
exposure showed low correlations. For example, the students' pretest rating of
the amount of material learned depends on the form of the instructor's
introduction to the course, which may not be at all related to his later
performance.
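
As an illustration of how a median pre-posttest correlation can be
obtained, the sketch below computes a Pearson correlation for each item
across classes and then takes the median over items. The item names and
all values are invented, and treating class means as the unit of analysis
is an assumption.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented class means for three items in five classes: (pretest, posttest).
items = {
    "difficulty": ([0.2, 0.4, 0.6, 0.8, 1.0], [0.4, 0.8, 1.2, 1.6, 2.0]),
    "interest": ([0.2, 0.4, 0.6, 0.8, 1.0], [0.2, 0.6, 0.4, 1.0, 0.8]),
    "amount_learned": ([0.2, 0.4, 0.6, 0.8, 1.0], [0.6, 0.2, 0.8, 0.4, 0.6]),
}

correlations = {name: pearson_r(pre, post) for name, (pre, post) in items.items()}
median_r = statistics.median(correlations.values())

for name, r in correlations.items():
    print(name, round(r, 2))
print("median r:", round(median_r, 2))
```

Because the median ignores how far the extreme items lie from the center,
a single near-zero or negative item correlation barely moves it.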

Table 3 presents the results of a deliberate attempt by two instructors
to alter their initial expected student ratings in two of the four essentially
equivalent sections of introductory psychology they taught. Instructor A
teaches a life oriented course and normally begins with an interest arousing
lecture and discussion on the misconceptions man has about human behavior.
Instructor B teaches an experimentally oriented course and normally begins
with a lecture on the methods of science and historical perspectives.
Instructors A and B each used the other's approach as best they could for two class
meetings in one of their two sections. Both instructors reported having
forgotten during the semester which of their two sections had received the atypical
introduction. The difference between the mean pretest ratings for Instructor A
was significant at the .02 level and for Instructor B, beyond the .01 level.
There were no differences on the posttest ratings for either instructor, thus
showing that the students were able to alter their first impression ratings to
fit the instructor's typical performance shown throughout the semester. Although
the differences in mean ratings between the interest and noninterest arousing
introductory lectures were highly significant, the generalizability of this
finding is low because of the small sample size (2 instructors, 4 sections).
However, this researcher believes that these findings are of critical importance
in demonstrating at least one aspect of the validity of student evaluations and
thus this portion of the study demands replication on a larger scale, if adequate
control can be maintained to protect those students who may be inadvertently
negatively affected by unknowingly being part of the experimental group.
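
Table 3 reports, for each comparison, two means, two standard deviations,
a t value, and a correlation r. One reading, which is an assumption and is
not stated in the paper, is that the 21 item means served as paired
observations. Under that assumption a paired t can be recovered from the
summary statistics alone, as sketched below with Instructor B's pretest
figures; the function name is hypothetical.

```python
import math


def paired_t_from_summary(m1, sd1, m2, sd2, r, n):
    """Paired t from two means, two SDs, their correlation, and n pairs."""
    # Variance of the paired differences: var1 + var2 - 2 * covariance.
    var_diff = sd1 ** 2 + sd2 ** 2 - 2 * r * sd1 * sd2
    se = math.sqrt(var_diff / n)
    return (m2 - m1) / se


# Instructor B, pretest: interest arousing vs. no interest section.
# n = 21 assumes the items are the paired units (an assumption).
t = paired_t_from_summary(m1=0.60, sd1=0.40, m2=0.95, sd2=0.42, r=0.73, n=21)
print(round(t, 2))
```

The value obtained this way is of the same order as the reported t of 4.97
but does not reproduce it exactly, so the original computation may have
differed in its unit of analysis or in rounding.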

SUMMARY AND CONCLUSIONS

This investigation examined three aspects of student ratings of college
instructors: the distribution of ratings given after the first or second
class meeting and again during the last week of the semester; the correlation
between mean pretest and posttest ratings for each item using fifteen classes;
and the effects of short term deliberate manipulation of teaching style on ratings.
Tables 1 and 2 show that students in this study had a definite tendency to
rate instructors positively on both the pretest and posttest although ratings
on the posttest were generally more negative and variable. Only one item,
expected course difficulty, exceeded the expected mean (1.5) on the pretest. The
overall means for the combined 21 items on the pretest and posttest were .62 and
.72 respectively. It is likely that those classes that are specifically
instructed to accurately rate their instructor relative to all other instructors they
have had, considering all four options, would rate their teachers lower than
those classes whose attention was not directed to specifically considering all
the options. As a result of this tendency to rate all instructors positively,
those institutions that passively permit some use of student evaluations without
offering individual faculty members a statistical analysis of their ratings with
respect to those of other members of the department, school, or institution may
in actuality be promoting a false sense of satisfaction and security among faculty,
since individual faculty members may not be aware of the students' tendency to
report above average ratings. Therefore, an unusually poor teacher in the eyes
of the student may be smugly satisfied with his apparently average ratings which
in fact, when compared with the ratings given his colleagues, may place him at
the bottom. Obviously the reliability and differential validity of student
evaluations would be improved if techniques were used to encourage students to
realistically rate the relative effectiveness of their teachers on a true
four point scale.
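
The contrast between an apparently average rating and a poor relative
standing can be shown with a small sketch. It assumes ratings coded 0
through 3 with lower values more favorable; the function name and the
department figures are invented for illustration.

```python
def percentile_standing(own_mean, colleague_means):
    """Percent of colleagues whose mean rating is less favorable (higher)."""
    worse = sum(1 for m in colleague_means if m > own_mean)
    return 100.0 * worse / len(colleague_means)


# Invented mean ratings for eight colleagues (lower = more favorable).
department = [0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.62, 0.70]
own = 0.95
scale_midpoint = 1.5

# The instructor's mean is better than the scale midpoint...
print(own < scale_midpoint)                   # True
# ...yet no colleague in this invented department is rated less favorably.
print(percentile_standing(own, department))   # 0.0
```

Reporting a standing of this kind alongside the raw mean is one way an
institution could supply the comparative analysis described above.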

The median correlation between beginning and end of semester ratings
was shown to be +.60 with individual item correlations ranging from -.11 for
amount of material learned to +.86 for assessment of the course difficulty
and the teacher's perceived personality. Students generally viewed their
instructor and courses as less interesting, expected a lower grade, found the
textbook more objectionable, found the teacher's explanations more inadequate,
lost their desire to attend class, and thought the instructor wasted more
time at the end of the semester than at the beginning. Interestingly, however,
although students saw their instructors as more fair in constructing exams and
grading at the end of the course than at the beginning, there was a highly
significant drop in expected grade (t = 11.29). The results of this portion
of the study demonstrate that students are able to form relatively lasting
appraisals of their course and instructor after minimal exposure. The stability
of the ratings listed in Table 2 was generally agreeable to reason. Although
all characteristics of a course can be misjudged, those particular
characteristics that would be expected to require maximum exposure in order to make a
realistic judgment did, indeed, show the lowest pretest posttest correlations
(I learned, -.11; Tolerance to disagreement, .18; Intellectual stimulation, .20).

The final portion of this study was designed to determine whether or not
students in experimental and control groups would give significantly different
ratings to teachers who alter their teaching style in two introductory
psychology courses. Table 3 shows that indeed students rated the two styles of teaching
differently. The life oriented, interest arousing approach received
significantly higher ratings, t = 2.83 and 4.97 respectively, than the non-interest
arousing, basic science/historical approach. Nearly all individual ratings
were more negative for the rigid non-interest approach in both experimental
groups. There were no significant differences in the mean ratings at the
end of the semester between the experimental and control groups for each
instructor.

The correlations between the mean item ratings in the experimental and
control groups on the posttest for Instructors A and B were .90 and .92
respectively, which serve as a measure of the reliability of the rating
instrument across subjects.

The generalizability of this portion of the study, however, is
questionable because of the participation of only two instructors, which was a result
of this researcher's concern for maintaining strict control and the ethical
responsibility of an instructor to do his best, however he sees it, in a
course. However, because of the highly significant results reported here,
their importance to experimentally establishing the validity of student
evaluations, and the fact that altered teaching behavior showed no lasting effects,
this portion of the project should serve as a pilot study for repetition on
a larger scale.






TABLE 1

PERCENTAGE OF STUDENTS SELECTING EACH RESPONSE OPTION

                                              PRETEST                     POSTTEST
                                          RESPONSE OPTION             RESPONSE OPTION
ITEM                                      0    1    2    3    4       0    1    2    3    4

Interest in Course                       40   48    9    2    0      28   44   10    7    2
Course Difficulty                         4   39   47    9    0       8   43   40    9    1
My Grade                                 22   62   15    0    0       9   40   42    9    1
Textbook                                 17   47   32    5    0      17   36   32   12    4
Course Organization                      39   57    3    0           40   53    6    1
Teacher's Knowledge                      79   21    0    0           78   21    1    0
Teacher's Attitude Toward Course         67   30    2    1           64   32    3    1
Teacher's Explanations                   58   37    5    0           50   39    9    1
Intellectual Stimulation                 24   66   10    0           19   60   20    1
Speaking Ability                         68   30    2    0           63   33    3    1
Teacher's Attitude Toward Students       53   30   16    1           56   32   11    1
Grading Fairness                         18   79    3    0           53   42    4    1
Tolerance to Disagreement                58   39    2    0           54   41    3    2
Teacher's Personality                    59   36    3    2           57   38    3    2
Overall Rating                           20   47   32    2           24   47   25    4
Desire to Attend Class                   58   40    1    1           37   54    7    2
Value of Attendance                      96    4    1                81   15    4
Utilization of Time                      80   19    1    0           70   25    4    1
Amount Learned                           64   34    2                47   45    8
Satisfaction With Course                 70   26    4                77   17    6
Sticks to Subject                        66   32    2                61   35    4
Recommend to Friends (posttest only)                                 58   29   10    3



TABLE 2

MEANS, STANDARD DEVIATIONS, t TESTS AND CORRELATION BETWEEN PRE AND POSTTEST
MEAN RATINGS

                                             PRETEST         POSTTEST
     ITEM                                    M      SD       M      SD      DIFF.      t         r

 1.  Interest in Course                     .84    .40     1.23    .59     +.39      3.32**    .64
 2.  Course Difficulty                     1.52    .32     1.43    .29     -.09       .79      .86
 3.  My Grade                               .95    .19     1.58    .28     +.63     11.29**    .63
 4.  Textbook                              1.25    .25     1.70    .57     +.45      4.14**    .75
 5.  Course Organization                    .68    .15      .73    .29     +.05       .84      .60
 6.  Teacher's Knowledge                    .22    .11      .24    .15     +.02       .78      .63
 7.  Teacher's Attitude Toward Course       .42    .33      .44    .33     +.02       .30      .72
 8.  Teacher's Explanations                 .51    .21      .69    .35     +.18      2.15*     .42
 9.  Intellectual Stimulation               .92    .22     1.03    .22     +.11       --       .20
10.  Speaking Ability                       .40    .26      .49    .34     +.09       .83      .71
11.  Teacher's Attitude Toward Students     .71    .37      .62    .29     -.09      1.14      .57
12.  Grading Fairness                       .82    .12      .58    .25     -.24     -4.47*     .63
13.  Tolerance to Disagreement              .54    .37      .56    .25     +.02       .19      .18
14.  Teacher's Personality                  .63    .44      .64    .41     +.01       .24      .86
15.  Overall Rating                        1.21    .46     1.16    .45     -.05       .44      .60
16.  Desire to Attend Class                 .43    .15      .71    .25     +.28      5.46**    .60
17.  Value of Attendance                    .20    .42      .33    .42     +.13      2.04*     .83
18.  Utilization of Time                    .19    .12      .33    .22     +.14       --        --
19.  Amount Learned                         .49    .25      .65    .32     +.16      1.64     -.11
20.  Satisfaction With Course               .43    .30      .37    .33     -.06       .74      .53
21.  Sticks to Subject                      .34    .23      .41    .19     +.07      1.19      .33

     Total Overall Rating - All 21
     Scales Combined                        .62    .37      .72    .40     +.10      2.70*     .90

 *  Significant at .05
 ** Significant at .01
 -- Entry not legible in the source copy
 Lower Mean = More Positive Rating



TABLE 3

COMPARISON OF PRE AND POSTTEST RATINGS WHEN INSTRUCTORS DELIBERATELY ALTER
THEIR NORMAL TEACHING STYLE

INSTRUCTOR A

                    Interest arousing    No interest       t        r

Pretest      M            .48                .63          2.83*    .00
             SD           .36                .33
Posttest     M            .58                .59           .31     .98
             SD           .36                .39

INSTRUCTOR B

                    Interest arousing    No interest       t        r

Pretest      M            .60                .95          4.97**   .73
             SD           .40                .42
Posttest     M           1.16               1.20           .72     .92
             SD           .52                .40

 *  Significant at .02
 ** Significant at .01



REFERENCES

Aleamoni, L. The usefulness of student evaluations in improving college
    teaching. In, Proceedings, The First Invitational Conference On Faculty
    Effectiveness As Evaluated By Students, Alan L. Sockloff, editor.
    Measurement and Evaluation Center, Temple University. Philadelphia, Pa.,
    1973.

Bausell, R. B. and Magoon, J. The persistence of first impressions in course
    and instructor evaluation. Unpublished paper, American Educational
    Research Association, 1972.

Centra, J. The student as godfather? The impact of student ratings on academia.
    In, Proceedings, The First Invitational Conference On Faculty Effectiveness
    As Evaluated By Students, Alan L. Sockloff, editor. Measurement and
    Evaluation Center, Temple University. Philadelphia, Pa., 1973.

Costin, F. A graduate course in the teaching of psychology: description and
    evaluation. Journal of Teacher Education, 1968, 19, 425-432.

Costin, F., Greenough, W., Menges, R. Student ratings of college teaching:
    reliability, validity, and usefulness. Review of Educational Research,
    1971, 41, 511-535.

Dick, W. Course attitude questionnaire: its development, uses, and research
    results. University Division of Instructional Services, The Pennsylvania
    State University, Report No. 106, Revised by D. Stickell, September, 1967.
    (mimeographed)

Dressel, P. Student evaluation of faculty: Why? What? How? In, Proceedings,
    The First Invitational Conference On Faculty Effectiveness As Evaluated
    By Students, Alan L. Sockloff, editor. Measurement and Evaluation Center,
    Temple University. Philadelphia, Pa., 1973.

Oles, H., Lencoski, A. Changes in instructor's self-rating resulting from
    feedback from student evaluations. Catalog of Selected Documents in
    Psychology.

Remmers, H. The relationship between students' marks and students' attitudes
    toward their instructors. School and Society, 1928, 28, 759-760.

Stallings, W. M. and Spencer, R. E. Ratings of instructors in Accounting
    101 from video-tape clips. Research Report, Measurement and
    Research Division, Office of Instructional Resources, University of
    Illinois, 1967.



